A Structural Cognitive Approach to the Assessment
of Classroom Learning

The present paper describes a method of assessing classroom knowledge
that involves an integration of psychometric and cognitive perspectives.

Perhaps because of their different interests, these two approaches
historically have had relatively little influence on one another. Whereas
psychometricians are primarily concerned with the predictiveness of a
measure, cognitivists have been more concerned with representational models
of knowledge. In this paper we hope to show that there exists a natural
synergism between the cognitive and psychometric approaches that, when the
two are appropriately integrated, can facilitate progress towards their
respective goals. More specifically, the cognitive perspective, with its
structural assumptions regarding the representation of knowledge, can provide
the basis for some new and useful methods to assess classroom learning. The
psychometric approach, on the other hand, with its emphasis on test validity
and reliability, can provide a much needed empirical basis for models of
knowledge representation.

We  begin  this  paper  by  contrasting  the  cognitive  approach  and  the 
psychometric  approach  as  they  are  implemented  in  classroom  assessment.  We 
then turn to a more detailed discussion of a structural approach to knowledge
assessment,  which  integrates  the  cognitive  and  psychometric  perspectives 
within  the  context  of  classroom  learning. 

Two Contrasting Perspectives on Knowledge Assessment

The psychometric approach, as applied in the classroom setting, usually
assesses knowledge with conventional essay, true-false, and multiple-choice
exams. A student's performance on this type of exam is usually represented
in terms of a percentage correct. Many educators are perhaps so familiar
with this generic form of examination in their classes that they no longer
consider the assumptions underlying this "how much" approach to knowledge
assessment. By accumulating points across questions, we are assuming a kind
of independence that suggests we conceptualize knowledge as a list of
independent facts or elements. Although this criticism may be less true of
essay exams, it remains the case that using a single index, such as
percentage correct, tells us very little regarding what a student knows or
does not know.

A simple list of items may serve as an appropriate representation for
certain limited domains (e.g., the capital cities of the 50 states of this
country), but a great deal of empirical and theoretical work in the
cognitive literature suggests that a list is not a valid means of
representing more complex domains of knowledge (e.g., Chi, Glaser, & Farr,
1988; Genter & Collins, 1983). A commonly held and long-standing assumption
in cognitive psychology is that knowledge is organized and structured (Bower,
1975; Tulving & Donaldson, 1972; Wertheimer, 1945). From the cognitive
perspective, to be knowledgeable of a domain, one must understand the
interrelationships among the important concepts within the domain. Consistent
with this assumption, cognitive models of knowledge representation are
primarily concerned with the types of structures that organize bodies of
knowledge. In fact, the meaning of any specific concept is assumed to be
largely dependent on its interrelationships with other concepts. Although
there are a variety of structural models of knowledge in the cognitive
literature (e.g., Anderson & Bower, 1973; Collins & Quillian, 1969), most
share a central theme in assuming that the interrelations among concepts are
an essential property of knowledge.

As Shavelson and colleagues (Shavelson, 1972; Shavelson & Stanton,
1975) realized some two decades ago, this assumption regarding the
representation of knowledge has some important implications for the
assessment of classroom learning. Basically, how we assess knowledge should
be consistent with how we assume knowledge is represented. If structural
properties are an important component of knowledge representation, then our
assessment tools must measure these structural properties. Over the past few
decades, an impressive literature has accumulated indicating that the
structural properties of domain knowledge are closely related to competence
in the domain (e.g., Chase & Simon, 1973; Chi, Glaser & Rees, 1981). From
this perspective, knowledge of a domain implies at some level understanding
how the various domain concepts are interrelated. This view strongly suggests
that our methods of assessment must capture this structural component of
knowledge in order to be valid.

An obvious implication is that we should use some type of cognitive
representational model to assess an individual's knowledge of a domain. In
the next section we describe in some detail how a structurally oriented
approach to knowledge assessment can be successfully implemented. However,
before we conclude this section we need to discuss how the structural
assessment approach is mutually beneficial to the cognitive approach and the
psychometric approach as it is applied in the classroom. Its potential
benefits to the psychometric approach are threefold. First, it would more
solidly ground classroom evaluation in a context of knowledge representation
theory. Second, if structural aspects of knowledge are related to domain
performance, the assessment of these structural properties should improve
prediction. Finally, as will be discussed in some detail later, the
representation may be presented in the form of a visual graph that allows the
instructor to more easily identify the locus of a student's misconceptions
regarding the domain. This in turn could facilitate individualized training
intervention.

One benefit of a structural approach to assessment for cognitive
theory is that it provides an empirical basis for evaluating different
representational models of knowledge. This type of representational
validation has been largely lacking in the cognitive literature. As will
become apparent when we describe the implementation of the structural
approach, the structural representations are evaluated in terms of their
ability to predict classroom exam performance. In other words, each student
will have her own unique, empirically derived representation of a knowledge
domain. Thus, predictive validity plays a central role in choosing a
theoretical representation of domain knowledge.

This stands in contrast to the methods by which most cognitive
representational models are validated. Cognitivists have been far more
concerned with issues relating to the architecture of their models of
semantic memory and knowledge representation. Among other things,
these models attempt to capture the way we rapidly access and retrieve
various bits of information from memory. Experiments designed to test these
models often look at how stimulus parameters (e.g., word length) influence
response latencies. The models are intended to apply to large populations
(e.g., native English speaking adults), or specific groups (e.g., expert
programmers), with little or no interest in individual differences.

In summary, our aim is to build some bridges between applied educational
testing and cognitive theories of knowledge representation. We believe the
schism between the two fields is unnecessary and counterproductive. It
developed, we believe, primarily out of their different interests. The
cognitivists were concerned with the development of models of cognitive
representational systems, whereas the educational assessment researchers were
more concerned with the immediate issues of validity and reliability.
Indeed, there exists a natural synergism between the two fields that could be
mutually beneficial to the progress of both. Specifically, we hope to show
that test theorists' concerns with predictiveness will benefit modeling of
cognitive structure, and the cognitivists' structural perspective will
positively influence the development of the methods used to assess domain
knowledge.


Structural  Assessment:  Methods  and  Findings 

In  this  section  we  provide  a  general  methodological  overview  of 
structural  approaches  to  knowledge  assessment,  with  special  emphasis  on 
methods  we  have  developed  over  the  past  few  years.  Although  not  a 
comprehensive  review  of  the  literature,  the  discussion  should  give  the  reader 
a basic understanding of the structural approach, how it differs from more
conventional testing approaches, a smattering of relevant findings, and some
of  the  more  important  issues  and  implications  viewed  from  the  structural 
perspective. 

Research on structural knowledge assessment in classrooms began to
appear, primarily in educational psychology journals, in the late 1960s and
early 1970s (e.g., Johnson, 1967; 1969; Kass, 1971; Shavelson, 1972;
Shavelson & Stanton, 1975). Several investigators reported encouraging
findings,  indicating  that  classroom  performance  was  related  to  students' 
structural  organization  of  the  central  concepts  in  the  course.  For  example, 
Fenker  (1975)  had  students  in  a  measurement  class  and  a  design  class  rate  the 
relatedness  of  pairs  of  concepts  and  then  transformed  their  ratings  to  an  MDS 
spatial  representation.  The  students'  MDS  representations  were  then  compared 
with  a  referent  representation  based  on  the  average  ratings  of  eight  experts 
in  each  domain.  He  found  that  students'  similarity  to  the  referent  structure 
was correlated (r = .54) with course grades in the design course, and (r = .61)
with  grades  in  the  measurement  course.  Despite  the  generally  positive  outcome 
of  this  early  work,  there  were  a  number  of  specific  methodological  problems 
that  hampered  further  advances.  Perhaps  foremost  was  the  lack  of 
quantitative  methods  for  evaluating  structural  representations.  We  believe 
that  our  current  research  has  made  significant  progress  in  addressing  these 
issues.

Our discussion of structural assessment methods is organized in terms of
the three major steps that are involved in their implementation: (a)
elicitation - evoking some behavioral index of an individual's organization
of domain concepts; (b) representation - applying techniques that transform
the elicited data into a representation that captures the important structural
properties of domain knowledge; and (c) evaluation - quantifying the level of
knowledge or sophistication that is reflected in the representation.

Elicitation 

Elicitation, as the word suggests, is the process of evoking or
extracting what a person knows about some knowledge domain. There is a wide
range of methods for eliciting knowledge, from direct approaches, such
as interviews and conventional essay exams, to more indirect approaches where,
for example, knowledge may be inferred on the basis of reaction times (e.g.,
Collins & Quillian, 1969).

One important point about elicitation is that the method of elicitation
should be compatible with the cognitive model of knowledge representation.
Thus, if it is assumed that knowledge is structural in its representation, it
follows that the elicited behavior should be sensitive to the
interrelationships among the concepts. The implications of this assertion
will be better appreciated after we have discussed the elicitation,
representation, and evaluation phases of the structural approach.

For  the  present,  it  suffices  to  say  that  the  elicitation  procedure  must 
provide  some  indication  of  the  relatedness  between  pairs  of  concepts.  With 
an  appropriate  representational  transformation  of  these  relatedness  ratings 
it  should  be  possible  to  capture  more  global  structural  properties  of  domain 
knowledge. 

Although a variety of elicitation methods have been used to obtain
concept relationships, including word associations (Johnson, 1967), ordered
recall (Cooke, Durso, & Schvaneveldt, 1986), and card sorting (Shavelson &
Stanton, 1975), simply having subjects make subjective ratings of degree of
relatedness between pairs of concepts works quite well in assessing an
individual's knowledge of the interrelations among domain concepts (Fenker,
1975; Goldsmith, Johnson, & Acton, 1991). Furthermore, there may be certain
advantages to using relatedness ratings to elicit domain knowledge. First,
subjects have no difficulty using a numerical scale to express their sense of
relatedness. As a result, it is relatively simple to automate the
administration and scoring of the ratings. This allows for the objective and
efficient gathering of large amounts of relatedness data. Second, unlike essay
exams and interviews, relatedness ratings do not assume that subjects have
conscious access to all relevant knowledge. In fact, in our own work we have
found that requiring subjects to make rapid relatedness judgments on the basis
of their initial intuitions may result in more reliable and valid ratings than
allowing unlimited time.

Two questions about concept selection inevitably arise when using
relatedness judgments to assess domain knowledge, namely, how many and which
concepts should be rated? Not surprisingly, these two questions are closely
related,  since  the  number  of  concepts  required  to  obtain  a  valid  assessment 
is  likely  to  depend  on  how  the  concepts  are  selected. 


In deciding on the number of concepts to be rated we must consider how
the number of concepts influences the total number of pairs that are rated.
At the extremes, each concept could be paired with only one other concept or
with all other concepts in the list. Because some structural methods of
analyzing ratings require that data be collected on all pairwise combinations
of concepts (e.g., Pathfinder; Schvaneveldt, 1990), we will focus the
discussion on this case. When all pairwise combinations of concepts are rated
for n concepts, there will be n(n - 1)/2 pairwise ratings. For example, 24
concepts would result in 276 pairs, which requires approximately 45 minutes
for most students to complete. For practical considerations, including
attention span and fatigue, this sets an upper limit of approximately 30
concepts we can expect students to rate in a single session.
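
The pair count above is easy to verify. The following is a minimal sketch in
Python; the function names and the placeholder concept labels are assumptions
made for illustration and are not part of the original procedure.

```python
from itertools import combinations


def pair_count(n):
    """Number of unordered pairs among n concepts: n(n - 1)/2."""
    return n * (n - 1) // 2


def all_pairs(concepts):
    """Every unordered pair of concepts; each pair would be rated once."""
    return list(combinations(concepts, 2))


# Placeholder labels (hypothetical), standing in for 24 course concepts.
concepts = [f"concept_{i}" for i in range(1, 25)]
pairs = all_pairs(concepts)

print(len(pairs), pair_count(24), pair_count(30))
```

Thirty concepts already require 435 ratings. If students kept the pace implied
by 276 pairs in about 45 minutes (an assumption), 435 pairs would take roughly
70 minutes, which fits the practical ceiling described above.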

In one study (Goldsmith, Johnson, & Acton, 1991) involving an
undergraduate course in design of experiments, we found that when students
rated all pairwise combinations of concepts, predictiveness of course
performance improved in a linear manner from .15 to .74 as the number of
concepts rated increased from 5 to 30. Although this suggests that more is
better, we have found that, with 24 concepts, predictions of college classroom
course performance ranged from approximately .50 to .85 across several
different domains (cognitive psychology, computer programming, and design of
experiments).

We turn next to the question of how concepts are selected. We first
attempted  to  generate  a  fairly  comprehensive  list  of  the  important  concepts  in 
a  subject  by  analyzing  the  glossary  and  index  of  relevant  textbooks.  We  then 
conferred  with  the  course  instructor,  to  add  any  important  concepts  that  were 
missing.  From  this  list  we  selected  a  sample  of  concepts  (usually  24)  that 
the  instructor  agreed  were  representative  of  the  course  material. 

Considerable  work  is  left  to  be  done  on  developing  a  set  of  criteria  to 
serve  as  a  systematic  basis  for  selecting  concepts.  One  obvious  criterion 
proposed  by  Hirsch  (1987)  and  Boneau  (1990)  is  the  concept's  importance  to  the 
domain,  as  judged  by  experts.  Being  knowledgeable  of  the  most  important 
concepts  within  a  domain  may  be  sufficient  if  our  only  goal  is  to  define  some 
basic  level  of  competence,  but  these  concepts  may  not  adequately  discriminate 
among  higher  levels  of  expertise.  Thus,  another  basis  for  selection  would  be 
to  select  those  concepts  which  best  discriminate  between  levels  of  expertise. 

Selecting  concepts  on  the  basis  of  their  correlation  with  exam  scores  is 
similar  to  the  item  selection  procedure  commonly  used  in  test  construction 
(Anastasi, 1988). When this procedure is used in test development it applies
to  specific  items,  whereas  in  the  rating  task  the  selection  of  a  concept  would 
imply  that  it  would  be  paired  with  the  other  n-1  concepts.  Thus,  item 
selection  may  be  more  efficiently  applied  to  pairs  of  concepts  than  individual 
concepts. 

Recently, we have found (Goldsmith & Johnson, 1990) that by selecting the
more predictive pairs, it is possible to predict classroom exam performance as
well with ratings of 100 or fewer selected pairs as with all 276 pairwise
combinations of 24 concepts. Simply in terms of prediction there
appear to be obvious benefits to employing an item selection procedure.
However, there is a cost when it comes to transforming the ratings into a
structural representation. This will become more apparent in the next
section, where we discuss the representation of the elicited knowledge.
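
One way such a pair selection might be carried out is sketched below. The
text does not say how the predictiveness of a pair was computed, so the
criterion used here is an assumption: for each pair, a student's agreement
with the referent is scored as the negative absolute difference between the
two ratings, and pairs are ranked by how strongly that agreement correlates
with exam scores. The function names and the toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant (correlation undefined).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_pairs(student_ratings, referent, exam_scores, k):
    """Indices of the k pairs whose agreement with the referent correlates
    most strongly with exam scores (assumed criterion, see lead-in).

    student_ratings: one list of pair ratings per student
    referent: the referent rating for each pair
    """
    scored = []
    for p in range(len(referent)):
        # Agreement on pair p: negative absolute deviation from the referent.
        agreement = [-abs(s[p] - referent[p]) for s in student_ratings]
        scored.append((pearson(agreement, exam_scores), p))
    scored.sort(reverse=True)
    return sorted(p for _, p in scored[:k])


# Toy data (invented): 4 students, 3 pairs, ratings on a 1-7 scale.
referent = [7, 1, 4]
students = [[7, 4, 1], [6, 4, 7], [4, 4, 2], [2, 4, 6]]
exams = [90, 80, 60, 40]

print(select_pairs(students, referent, exams, 1))
```

In this toy example only the first pair separates strong from weak students,
so it is the one retained. In practice the selected pairs would need to be
checked on a new sample of students, since selecting and evaluating on the
same data overstates predictiveness.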

Representation 

Once  we  have  elicited  an  individual's  concept  interrelationships  in  a 
domain,  we  must  decide  how  to  transform  these  raw  proximities  into  a 
representation  that  best  models  the  individual's  knowledge.  We  mention  three 
important criteria in choosing a representation. First, the representation
should  have  acceptable  predictive  validity.  That  is,  we  should  be  able  to 
predict  an  individual's  level  of  competence  in  a  domain  at  least  as  well  with 
the  representation  as  with  the  untransformed  ratings. 

Second,  the  representation  should  be  easily  comprehended.  One  advantage 
of  many  scaling  algorithms  is  that  they  result  in  visual  representations 
depicting  the  organization  among  concepts  in  a  manner  that  is  relatively 
easily  interpreted.  For  example,  cluster  analysis  represents  the  concepts 
organized in terms of a hierarchical graph (Johnson, 1967; Milligan & Cooper,
1987). Thus one can see by visual examination how an individual organizes the
concepts  within  a  domain. 

Finally,  the  representation  should  be  consistent  with  our  theoretical 
conceptions  of  knowledge.  In  the  case  of  conventional  exams  we  often  simply 
use  the  percentage  correct  to  represent  what  an  individual  knows  about  some 
domain.  As  argued  above,  this  method  suggests  that  knowledge  can  be 
conceptualized  as  an  accumulation  of  independent  facts.  A  percentage  index 
estimates  the  proportion  of  information  known.  Although  the  information  may 
actually  involve  understanding  certain  conceptual  relationships,  a  percentage 
does  not  explicitly  reflect  the  structural  properties  of  the  individual's 
knowledge. 

The next question is which type of representation better models the
specific structural property that is assumed to be important. There
are a variety of scaling procedures that researchers have historically used to
infer the structural organization underlying similarity judgments. One of the
more frequently used methods is multidimensional scaling (MDS) (e.g., Kruskal,
1964), which represents a set of concepts in terms of an n-dimensional
Euclidean space. Other scaling algorithms, such as cluster analysis (e.g.,
Johnson, 1967) and additive trees (Sattath & Tversky, 1977), result in
hierarchical graph representations. A more recently developed scaling
algorithm, Pathfinder (Schvaneveldt, 1990), also organizes the concepts into a
connected graph representation, but Pathfinder does not impose a hierarchical
solution and thereby allows greater freedom in developing an individual's
structural graph.
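
To make the idea of a non-hierarchical network concrete, the sketch below
derives a set of links from a symmetric distance matrix (for example,
relatedness ratings reversed so that larger values mean less related). It
assumes one commonly used Pathfinder parameter setting (r = infinity,
q = n - 1), under which a direct link is kept only if no indirect path has a
smaller largest step. The parameter setting, the function name, and the toy
distances are assumptions made for illustration; the text does not state
them, and this is not the published implementation.

```python
def pathfinder_links(dist):
    """Links retained in a Pathfinder-style network, assuming r = infinity
    and q = n - 1 (an assumption; the text does not state parameters).

    dist: symmetric matrix of distances (larger = less related).
    A link (i, j) is kept when its direct distance does not exceed the
    smallest achievable "largest step" over any path from i to j.
    """
    n = len(dist)
    # minimax[i][j]: smallest possible maximum link weight over all paths.
    minimax = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if dist[i][j] <= minimax[i][j]}


# Toy distances among four concepts (invented for illustration).
dist = [
    [0, 1, 4, 5],
    [1, 0, 2, 6],
    [4, 2, 0, 3],
    [5, 6, 3, 0],
]

print(sorted(pathfinder_links(dist)))
```

Here the links 0-2, 0-3, and 1-3 are dropped because each can be bypassed by
a path whose largest step is smaller. When distances are tied, more than
n - 1 links can survive, so the result need not be a tree or a hierarchy.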

To provide a concrete illustration of a Pathfinder network, Figures 1
and 2 show Pathfinder solutions for an expert's and a novice's ratings of
24 concepts from a cognition and memory course. Those readers having some
background in cognitive psychology will see that, while some of the novice's
structure is quite reasonable, it reveals a number of either missing or
inappropriate relationships.

Figure 1. Pathfinder network solution to expert's ratings of 24
concepts from course on cognition and memory.

Figure 2. Pathfinder network solution to undergraduate student's
ratings of 24 concepts at end of semester.

In choosing a type of representation, all of the above criteria must be
considered. If the research is theoretically motivated, the theory will suggest
the structural properties that are of primary interest, and this will
likely favor one representational approach over others. For example, there is
evidence (Holman, 1972; Pruzansky, Tversky, & Carroll, 1982) suggesting that
spatial representations, such as MDS, work better for perceptual phenomena
(e.g., color represented in terms of a three-dimensional space involving hue,
saturation, and brightness), whereas network representations are better for
conceptual phenomena (e.g., a biological taxonomy of animal species).

If, on the other hand, the research has a more applied orientation, then
ease of representation may play a more important role. For example, assume
the goal is to design an individualized curriculum that is aimed at addressing
specific knowledge deficits within a domain. This process could be
facilitated with the use of network representations, such as those presented
in Figure 1. By visually examining student and expert networks, it could
be  determined  which  specific  clusters  or  connections  were  missing  from  an 
individual  student's  organization  of  a  domain. 

Finally, the choice of representation can be based on predictiveness.
Using this criterion, the type of representation that provides the best
prediction of domain competence is preferred. We believe that the
predictiveness criterion, if used in moderation, could have a healthy
influence on the theoretical development of cognitive representations by
forcing the representations to make more fine-grained distinctions. Many
models of knowledge representation (e.g., Collins & Quillian, 1969) are able
to make very general predictions regarding the organization of knowledge
(e.g., the attribute of singing is more closely related to canaries than is
the attribute of eating), but they fail to address individual differences in
domain competence.

There is a danger of overemphasizing predictability as a basis for
favoring a particular representational transformation. On first consideration
it  may  appear  that  predictability  is  a  completely  objective  basis  of 
evaluating  the  validity  of  alternative  representations.  This  assumption, 
however,  is  only  true  to  the  extent  that  the  external  criterion  that  is  being 
predicted  is  an  objective  definition  of  competence.  In  the  case  of  our  own 
work  we  have  been  using  course  points  from  classroom  exams  as  the  external 
criterion.  At  some  point  we  must  ask  ourselves  if  we  would  be  happy  if  our 
structural  measure  correlated  perfectly  with  exam  scores.  Obviously  not.  The 
point  is,  we  doubt  the  ultimate  validity  of  conventional  exams,  but  we  must 
use  them  as  a  means  of  bootstrapping  a  new  alternative.  The  eventual 
acceptance  of  a  structural  approach  to  assessment  will  rest  upon  a  multitude 
of  criteria.  Thus,  the  overemphasis  on  a  single  criterion  at  this  early 
juncture  is  likely  to  be  misguided. 

In concluding our discussion of knowledge representations, it should be
apparent that research and theory in this field are still in their infancy. It
is far too early to exclude alternative representational systems from further
consideration on the basis of the preliminary data that are currently
available. We are proposing a broad scale program of research in which
different investigators will explore a variety of methods and applications.
The problems are sufficiently complex to accommodate more than a single model.

Evaluation 

The third step in knowledge assessment is to evaluate an individual's
knowledge representation. What level of sophistication or competence is
indicated by a particular representation? Clearly, we must have some means of
transforming a representation into a simple index of competence. We will
discuss two fundamentally different methods of evaluation. One approach we
call referent-based, in which the student's representation is compared against
some external standard. In referent-based evaluation some index of similarity
between the student and expert referent representation is used to predict
domain competence (e.g., classroom exam performance). The other approach to
evaluation is referent-free in that the assessment refers to intrinsic
properties of the student representation.

Referent Based Evaluations. When attempting to assess domain competence,
the most obvious external standard is an expert or group of experts in the
field (Chi, Feltovich, & Glaser, 1981). In our work, when assessing college
classroom knowledge, course instructors naturally serve as experts. Often we
have averaged the instructor's ratings with those of a number of other faculty
and graduate students who have taught similar courses. We find that a referent
structure based on the averaged ratings of a number of experts is usually a
better predictor of exam scores than one based only on the ratings of the
individual instructor for the course (Acton, 1990). This finding has some
important implications. Specifically, it allows for the possibility of moving
towards an idealized referent structure that transcends the various
idiosyncrasies of individual experts. We must emphasize that the idea of an
idealized referent structure does not in any way constrain individual
creativity. The fact is, although expert structures are more similar to one
another than novice structures, each expert's organization has unique
characteristics.

Precisely how the comparison between student and expert representation is
carried  out  depends,  in  part,  on  the  type  of  representation  being  compared. 

To  begin,  we  can  take  the  relatedness  ratings  matrix  itself  as  a  raw 
representation  of  an  individual's  knowledge.  The  most  obvious  and  direct  way 
to  assess  the  similarity  between  two  proximity  matrices  is  simply  to  compute 
the  correlation  between  the  two  sets  of  ratings.  We  have  found  this  measure 
of  similarity  to  be  a  good  predictor  of  classroom  exam  performance  with 
correlations  between  similarity  and  total  points  on  exams  ranging  from  .45  to 
.83  across  different  semesters  and  different  courses. 
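
The raw-ratings comparison can be sketched as follows. The sketch assumes
that each set of ratings is stored as a symmetric matrix, that the referent
is the element-wise average of several experts' matrices, and that similarity
is the Pearson correlation over the ratings above the diagonal. The function
names and the toy matrices are hypothetical and serve only as illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant (correlation undefined).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def upper_triangle(matrix):
    """Flatten the ratings above the diagonal of a symmetric matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def average_matrices(matrices):
    """Element-wise mean of several experts' rating matrices."""
    n = len(matrices[0])
    count = len(matrices)
    return [[sum(m[i][j] for m in matrices) / count for j in range(n)]
            for i in range(n)]


def rating_similarity(student, referent):
    """Correlation between a student's and a referent's pairwise ratings."""
    return pearson(upper_triangle(student), upper_triangle(referent))


# Toy ratings for three concepts (invented), on a 1-7 relatedness scale.
expert_1 = [[0, 7, 2], [7, 0, 4], [2, 4, 0]]
expert_2 = [[0, 5, 2], [5, 0, 6], [2, 6, 0]]
referent = average_matrices([expert_1, expert_2])

student_a = [[0, 6, 2], [6, 0, 5], [2, 5, 0]]  # matches the referent
student_b = [[0, 2, 6], [2, 0, 5], [6, 5, 0]]  # reverses two of the pairs

print(rating_similarity(student_a, referent))
print(rating_similarity(student_b, referent))
```

Each student's similarity score would then be correlated with exam points
across students to obtain predictive correlations of the kind reported above.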

Although the correlation on raw ratings may perform quite well as a
predictor, it does not fare well on the other two criteria by which we
evaluate representations. First, a matrix of ratings is not easily
comprehended, and second, it is not motivated from any explicit theoretical
perspective. If we adopt a structural approach, we want to look at
representations and methods of comparing representations that emphasize
structural properties. Recall that our definition of structure focused on the
interrelationships among concepts, which we believe are best captured by
network representations. We also hypothesized that the meaning of an
individual concept is defined in terms of the concepts that are closely
related to it. This has some important implications for how we evaluate the
similarity between two networks.

When evaluating Pathfinder-derived network representations, it is quite
possible to quantify the similarity between a student and expert network graph
by simply correlating the graph distances between respective pairs of
concepts. However, this correlational measure of similarity does not capture
the more global properties of our definition of structure (viz., a concept
which is defined by its neighbors). To overcome this limitation, we
developed (Goldsmith & Davenport, 1990) a set theoretic measure called C that
reflects the similarity between a concept's neighborhoods in two networks. For example,
assume  that  concept  A  in  a  student's  network  is  directly  linked  to  concepts 
B,  C,  and  D,  whereas  concept  A  in  the  expert's  network  is  linked  to  concepts  B 
and  C.  The  measure  C  is  the  ratio  of  the  size  of  the  intersection  (B  and  C) 
over  the  size  of  the  union  (B,  C,  and  D)  or  .67.  We  do  this  for  each  concept 
and  then  simply  average  the  ratios  over  all  the  concepts.  We  have  found  the 
similarity  measure  C  of  Pathfinder  networks  to  be  a  better  predictor  of  exam 
scores  than  correlational  measures  on  raw  proximity  data,  network  distances, 
or  Euclidean  distances  derived  from  MDS  scaling  (Goldsmith,  Johnson,  &  Acton, 
1991).
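
The computation of C described above can be sketched as follows. This is a minimal illustration, not the authors' software: the edge lists and function names are hypothetical, links are treated as undirected, and a concept with no links in either network is skipped (an assumption, since the text does not say how that case is handled).

```python
def neighbor_sets(edges):
    """Map each concept to the set of concepts it is directly linked to."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors

def concept_ratio(concept, student, expert):
    """|intersection| / |union| of one concept's neighbors in the two networks."""
    s = student.get(concept, set())
    e = expert.get(concept, set())
    union = s | e
    return len(s & e) / len(union) if union else None

def c_measure(student_edges, expert_edges):
    """Average the per-concept ratios over all concepts."""
    student = neighbor_sets(student_edges)
    expert = neighbor_sets(expert_edges)
    ratios = []
    for concept in set(student) | set(expert):
        ratio = concept_ratio(concept, student, expert)
        if ratio is not None:  # assumption: skip concepts with no links at all
            ratios.append(ratio)
    return sum(ratios) / len(ratios)

# Concept A matches the example in the text: student links A to B, C, D;
# expert links A to B, C. The remaining links are hypothetical.
student_edges = [("A", "B"), ("A", "C"), ("A", "D")]
expert_edges = [("A", "B"), ("A", "C"), ("C", "D")]

ratio_a = concept_ratio("A", neighbor_sets(student_edges), neighbor_sets(expert_edges))
print(round(ratio_a, 2))
print(round(c_measure(student_edges, expert_edges), 2))
```

For concept A the intersection is {B, C} and the union is {B, C, D}, giving the ratio of 2/3 worked through in the text; the overall C is the mean of such ratios across A, B, C, and D.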

The  point  is  not  that  using  C  on  Pathfinder  networks  was  necessarily  a 
better  predictor,  but  that  our  methods  of  assessment  are  consistent  with  our 
view  of  domain  knowledge.  It  is  quite  possible  that  other  measures  and  other 
domains  may  yield  different  outcomes.  Although  we  expect  that  methods 
emphasizing  structural  properties  of  knowledge  will  generally  do  a  better  job 
of  assessing  domain  knowledge,  the  important  point  is  for  researchers  and 
practitioners to adopt a coherent and theoretically principled approach to
assessment. 

Referent  Free  Assessment.  Most  methods  for  evaluating  domain  knowledge 
involve  an  external  criterion  or  referent.  For  example,  in  conventional 
testing  there  is  the  externally  defined  "correct  answer"  against  which 
performance  is  evaluated.  In  contrast,  we  might  look  for  intrinsic  properties 
of  behavior  that  are  indicative  of  expertise.  Once  again,  the  specific 
intrinsic  properties  we  look  for  should  be  consistent  with  our  theoretical 
conceptions  of  domain  knowledge. 

In  our  structural  approach  to  knowledge  assessment  we  have  assumed  that  a 
concept's meaning is contained in its relationships to other concepts (i.e.,
its  neighbors)  within  the  domain.  Therefore,  if  concepts  A  and  B  are 
neighbors,  and  concepts  B  and  C  are  neighbors,  there  is  an  increased 
likelihood  that  concepts  A  and  C  are  also  neighbors.  As  an  individual  becomes 
more  knowledgeable  we  would  expect  her  judgments  of  relatedness  to  become  more 
constrained by these neighborhood factors. How might one go about quantifying
this type of constraint? Our approach is first to use the C measure
described above to compute a derived distance between all pairs of concepts on
the  basis  of  neighborhood  similarity.  Next,  we  compute  the  correlation 
between  the  raw  ratings  and  the  derived  ratings  for  all  pairs  of  concepts.  We 
call  this  measure  coherence.  We  have  found  coherence  to  be  a  reliable 
predictor of students' classroom knowledge. In addition, coherence increases
across levels of expertise ranging from naive student to knowledgeable
undergraduate to graduate student to professor (Acton, 1990).
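
One way the coherence computation might be sketched is shown below. Several details are assumptions made only for illustration, since the text does not specify them: a concept's neighbors are taken to be the concepts it is rated at or above a hypothetical threshold, the derived relatedness of a pair is the intersection-over-union of the two concepts' neighbor sets (each excluding the other member of the pair), and pairs for which both sets are empty are left out of the correlation.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either list is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def coherence(ratings, link_threshold):
    """Correlate raw ratings with relatedness derived from shared neighbors."""
    rated = {tuple(sorted(pair)): r for pair, r in ratings.items()}
    concepts = sorted({c for pair in rated for c in pair})
    # Assumption: a rating at or above the threshold makes two concepts neighbors.
    neighbors = {c: set() for c in concepts}
    for (a, b), r in rated.items():
        if r >= link_threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)
    raw, derived = [], []
    for a, b in combinations(concepts, 2):
        na, nb = neighbors[a] - {b}, neighbors[b] - {a}
        union = na | nb
        if not union:  # assumption: no neighborhood evidence, so skip the pair
            continue
        raw.append(rated[(a, b)])
        derived.append(len(na & nb) / len(union))
    return pearson(raw, derived)

# Hypothetical 1-7 ratings. The first rater sees two tight clusters,
# {A, B, C} and {D, E}; the second rates A-B and B-C high but A-C low.
clustered = {("A", "B"): 7, ("A", "C"): 6, ("A", "D"): 2, ("A", "E"): 1,
             ("B", "C"): 6, ("B", "D"): 1, ("B", "E"): 2, ("C", "D"): 3,
             ("C", "E"): 2, ("D", "E"): 7}
scattered = {("A", "B"): 7, ("A", "C"): 1, ("A", "D"): 6, ("A", "E"): 1,
             ("B", "C"): 7, ("B", "D"): 1, ("B", "E"): 2, ("C", "D"): 1,
             ("C", "E"): 6, ("D", "E"): 1}

print(coherence(clustered, link_threshold=5) > coherence(scattered, link_threshold=5))
```

Under these assumptions, the rater whose high ratings respect shared neighborhoods obtains a high coherence, while the rater who links A to B and B to C but rates A and C as unrelated does not. No expert referent enters the calculation.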

Another  type  of  referent  free  property  of  relatedness  ratings  is  the 
consistency with which repeated pairs of concepts are rated. In our rating
task we usually repeat approximately 10% of the pairs, and then compute the
correlation  between  repeated  ratings  for  each  individual.  We  find  that  this 
index  of  reliability  is  significantly  correlated  with  exam  performance.  Not 
surprisingly,  it  is  easier  to  be  consistent  when  you  are  knowledgeable  of  the 
concepts  you  are  rating. 
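
The reliability index just described can be sketched as follows; the response log, rating values, and function names are hypothetical. The sketch assumes each repeated pair appears exactly twice in the log and that the order of the two concepts within a pair does not matter.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def reliability(responses):
    """Correlate first and second ratings of the pairs that were presented twice.

    responses: list of ((concept, concept), rating) in presentation order.
    """
    first, second = {}, {}
    for pair, rating in responses:
        key = tuple(sorted(pair))
        if key in first:
            second[key] = rating  # second presentation of a repeated pair
        else:
            first[key] = rating
    repeated = sorted(second)
    return pearson([first[k] for k in repeated], [second[k] for k in repeated])

# Hypothetical log for one student: five pairs are rated twice,
# the remaining pairs only once.
responses = [
    (("A", "B"), 7), (("C", "D"), 2), (("A", "C"), 5), (("B", "D"), 1),
    (("B", "C"), 6), (("A", "D"), 3), (("D", "C"), 2), (("B", "A"), 6),
    (("C", "A"), 5), (("D", "B"), 2), (("C", "B"), 7),
]

print(round(reliability(responses), 2))
```

Pairs rated only once (here A-D) are ignored, so the index depends solely on how consistently the student rated the repeated pairs.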

To  summarize,  we  have  proposed  two  methods  of  evaluation,  referent  based 
and  referent  free.  In  the  case  of  referent  based  evaluation  we  noted  the 
advantages  of  using  expert  referent  representations  based  on  the  averaged 
ratings  of  several  experts  and  alternative  methods  of  quantifying  the 
similarity between two representations. In our discussion of referent free
methods we introduced the measure of coherence, which reflects internal
consistency of the ratings. It was noted that reliability may also be used as
a referent free evaluation. The ideal "good" student is realized when all
three  measures  (C,  coherence,  and  reliability)  are  high. 

Implications  for  Curriculum  Design  and  Instruction 

The value of assessment is contained in how it is used. If it goes no
further  than  informing  a  student  that  she  is  in  the  bottom  quartile  of  the 
class  it  is  of  little  constructive  value.  Therefore,  it  is  appropriate  to 
consider  some  of  the  important  implications  of  the  structural  approach  for  the 
design  of  curriculum  and  methods  of  instruction. 

Because  the  structural  approach  that  we  have  proposed  involves  a 
comparison  between  student  and  expert  network  representations,  it  permits  the 
identification of organizational differences at any level of detail. We can go
from looking for the presence or absence of specific links between concepts,
to looking at more global organizational properties of the two networks. This
offers the possibility of providing students with extremely comprehensive
feedback; however, it raises the question of how the feedback is to be used.
More  to  the  point,  what  are  the  instructional  implications  for  differences 
between  student  and  expert  networks? 
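
At the most detailed level, the link-by-link comparison mentioned above might look like the following sketch. The edge lists and function names are hypothetical, and links are treated as undirected; it is meant only to show the kind of feedback that could be generated, not a particular implementation.

```python
def link_set(edges):
    """Normalize undirected links so (A, B) and (B, A) compare equal."""
    return {tuple(sorted(edge)) for edge in edges}

def compare_links(student_edges, expert_edges):
    """Report which expert links the student lacks and which student links are extra."""
    student, expert = link_set(student_edges), link_set(expert_edges)
    return {
        "shared": sorted(student & expert),
        "missing": sorted(expert - student),  # in the expert network only
        "extra": sorted(student - expert),    # in the student network only
    }

# Hypothetical networks over four concepts.
student_edges = [("A", "B"), ("C", "A"), ("A", "D")]
expert_edges = [("A", "B"), ("A", "C"), ("C", "D")]

feedback = compare_links(student_edges, expert_edges)
for kind, links in feedback.items():
    print(kind, links)
```

A report of this kind identifies where the two networks differ, but, as discussed next, it does not by itself say what instruction should follow.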

On the one hand, it is relevant to know that a majority of students in
your  class  do  not  see  the  relationship  among  a  certain  cluster  of  concepts  on 
which  you  have  just  completed  lecturing.  Clearly,  it  is  important  to  have 
identified  this  subset  of  students,  but  given  this  information,  what  do  you  do 
about  the  apparent  deficit  in  their  knowledge?  It  is  unlikely  that  the  deficit 
can  be  corrected  by  simply  informing  the  students  that  concepts  A,  B,  C,  and  D 
are all closely related. Presumably they need more information on how these
concepts are interrelated, and when that information is provided in an
appropriate manner we will see the changes in their network representations.
Some support for this is provided in a study by Brown and Stanners (1983).
They showed that an MDS representation of a student's organization of concepts
in an introductory psychology class could be modified by focused training on a
small subset of concepts. The training involved having students make the
rating judgments, then publicly defend their rating to the class and the
instructor. In some instances the instructor would then spend several minutes
discussing the relationship between specific pairs of concepts.

Another  potential  advantage  of  adopting  a  cognitive  structural  approach 
to  assessment  is  that  the  students  can  be  given  an  objective  goal  that  has 
face  validity  and  is  theoretically  grounded.  Moreover,  the  referent  structure 
itself,  represented  as  a  graphic  network  of  interconnected  concepts,  can 
serve  as  a  type  of  organizational  schema  for  readings  and  lectures.  Unlike 
the  conventional  outline  that  forces  a  linear  organization,  a  network 
structure  can  explicitly  represent  all  the  important  relationships  that  need 
to  be  grasped.  With  computer  software  environments  such  as  hypertext  it 
would  be  possible  to  implement  the  empirically  derived  structure  of  experts 
within a domain (Jonassen, 1988). This would allow for intelligent nonlinear
browsing  through  the  domain  by  novices. 

General  Conclusion  and  Summary 

Our  primary  motivation  in  writing  the  paper  was  to  facilitate 
communication between traditional test theory and cognitive theory. The
central theme addressed the relation between how knowledge is represented and
how it is assessed. If our representation of knowledge is organized or
structured then our assessment of knowledge must capture this structure and
our instruction must reflect the structure. We then outlined how a structural
approach to assessment could be implemented and summarized some of the
encouraging  findings  in  the  area. 

In closing, we quickly summarize some of the advantages of the structural
approach  to  assessment.  First,  a  most  basic  requirement  of  any  assessment 
technique  is  that  it  can  be  applied  to  individuals,  as  can  be  done  with  the 
structural  approach.  Second,  the  administration  and  scoring  are  completely 
objective  and  efficient.  Once  the  concepts  or  pairs  have  been  selected  the 
entire  process  can  be  easily  automated  on  computers.  In  regard  to  ease  of 
administration  it  should  also  be  noted  that  the  program  that  presents  the 
pairs  always  randomizes  the  order  of  presentation  for  each  subject,  thus 
minimizing  order  effects  and  the  risk  of  cheating  when  administered  in  groups. 
Also,  it  is  a  simple  matter  to  create  multiple  versions  of  the  rating  task  by 
changing a proportion of the concepts that are paired. This, of course, allows
repeated  administrations  of  the  task  over  the  duration  of  a  course,  which 
would  provide  a  picture  of  structural  change  as  learning  progresses.  Third, 
although  the  knowledge  that  directs  our  judgments  of  relatedness  is  sometimes 
entirely explicit, it appears, on the basis of students' introspections, that
the  judgments  are  often  intuitively  based  and  dependent  on  implicit  knowledge. 
In  this  regard  the  approach  may  nicely  complement  some  conventional  exams 
(e.g., essay) that depend more on explicit knowledge. Fourth, the results not
only indicate how much a student knows (e.g., relative similarity to an
expert referent structure), but also what specific relationships are
misunderstood, and whether the individual is internally consistent
(i.e., coherent) in her judgments of relatedness. Fifth, and most important
in  our  opinion,  the  entire  process,  involving  both  training  and  assessment, 
is  grounded  in  a  common  theoretical  framework.  This  should  foster  greater 
communication  and  compatibility  between  the  historically  distant  areas  of 
psychometric assessment and cognitive theories of representation. Both should
benefit  from  this  common  orientation. 